A parallel approximate SS-ELM algorithm based on MapReduce for large-scale datasets

Authors

  • Cen Chen
  • Kenli Li
  • Aijia Ouyang
  • Keqin Li
Abstract

The Extreme Learning Machine (ELM) algorithm has attracted considerable attention from researchers and has been widely applied in recent years, especially to big data, because of its good generalization performance and fast learning speed. SS-ELM (semi-supervised Extreme Learning Machine) extends ELM to semi-supervised learning, an important problem in machine learning on big data. However, the original SS-ELM algorithm needs to store the data in memory before processing it, so it cannot handle the large, web-scale data sets that are common in the era of big data. To solve this problem, this paper first proposes an efficient parallel SS-ELM (PSS-ELM) algorithm on the MapReduce model, adopting a series of optimizations to improve its performance. Then, a parallel approximate SS-ELM algorithm based on MapReduce (PASS-ELM) is proposed. PASS-ELM is based on the approximate adjacent similarity matrix (AASM) algorithm, which leverages the Locality-Sensitive Hashing (LSH) scheme to calculate the approximate adjacent similarity matrix, thus greatly reducing the computational complexity and memory footprint. The proposed AASM algorithm is general, because calculating the adjacent similarity matrix is a key operation in many other machine learning algorithms. The experimental results demonstrate that the proposed PASS-ELM algorithm can efficiently process very large-scale data sets with good performance, without significantly affecting the accuracy of the results. © 2017 Elsevier Inc. All rights reserved.
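
The general idea of using LSH to approximate a similarity matrix can be sketched as follows. This is not the paper's AASM algorithm, whose details the abstract does not give. It is a minimal single-machine illustration that assumes random-hyperplane LSH and a Gaussian similarity kernel, neither of which is confirmed by the abstract. The names `lsh_signature` and `approx_similarity`, and the parameters `num_planes` and `sigma`, are hypothetical. Points are hashed into buckets, and similarities are computed only between points that share a bucket, so the result is sparse rather than a full n-by-n matrix.

```python
import math
import random
from collections import defaultdict


def lsh_signature(point, hyperplanes):
    """Return the sign pattern of `point` against each random hyperplane."""
    return tuple(
        sum(p * h for p, h in zip(point, plane)) >= 0.0
        for plane in hyperplanes
    )


def approx_similarity(points, num_planes=4, sigma=1.0, seed=0):
    """Sparse Gaussian similarities between points that share an LSH bucket.

    Assumption: random-hyperplane hashing and a Gaussian kernel are used
    for illustration only; the source abstract does not specify either.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    hyperplanes = [
        [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)
    ]

    # Group point indices by their hash signature.
    buckets = defaultdict(list)
    for idx, point in enumerate(points):
        buckets[lsh_signature(point, hyperplanes)].append(idx)

    # Compute pairwise similarities inside each bucket only.
    similarity = {}
    for members in buckets.values():
        for pos, i in enumerate(members):
            for j in members[pos + 1:]:
                sq_dist = sum(
                    (x - y) ** 2 for x, y in zip(points[i], points[j])
                )
                weight = math.exp(-sq_dist / (2.0 * sigma ** 2))
                similarity[(i, j)] = weight
                similarity[(j, i)] = weight
    return similarity


if __name__ == "__main__":
    data = [[1.0, 1.0], [1.1, 0.9], [-1.0, -1.0], [-0.9, -1.1]]
    for pair, weight in sorted(approx_similarity(data).items()):
        print(pair, round(weight, 4))
```

In a MapReduce setting, one natural mapping (again an assumption, not a description of PASS-ELM) would have the map phase emit each point keyed by its signature and the reduce phase compute the within-bucket similarities. Increasing `num_planes` makes buckets smaller, which lowers cost but drops more true neighbours.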


Similar resources

Parallel extreme learning machine for regression based on MapReduce

Regression is one of the most basic problems in data mining. For regression problems, the extreme learning machine (ELM) can achieve better generalization performance at a much faster learning speed. However, the growing volume of datasets makes regression by ELM on very large-scale datasets a challenging task. Through analyzing the mechanism of the ELM algorithm, an efficient parallel ELM for regression ...


Efficient Batch Parallel Online Sequential Extreme Learning Machine Algorithm Based on MapReduce

With the development of technology and the widespread use of machine learning, more and more models need to be trained to mine useful knowledge from large-scale data. Training multiple models accurately and efficiently, so as to make full use of limited computing resources, has become a challenging problem. As one of the ELM variants, the online sequential extreme learning machine (OS-ELM) provides a ...


ELM-Based Distributed Cooperative Learning Over Networks

This paper investigates distributed cooperative learning algorithms for data processing in a network setting. Specifically, the extreme learning machine (ELM) is introduced to train a set of data distributed across several components, and each component runs a program on a subset of the entire data. In this scheme, there is no requirement for a fusion center in the network due to e.g., practica...


Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale data-intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and that the same workload is assigned to each node. Default Hadoop d...


Factor Modeling for Advertisement Targeting

We adapt a probabilistic latent variable model, namely GaP (Gamma-Poisson) [6], to ad targeting in the contexts of sponsored search (SS) and behaviorally targeted (BT) display advertising. We also approach the important problem of ad positional bias by formulating a one-latent-dimension GaP factorization. Learning from click-through data is intrinsically large scale, even more so for ads. We sc...



Journal:
  • J. Parallel Distrib. Comput.

Volume 108


Publication date: 2017